To guide a UAV swarm through narrow openings, a trapezoid virtual tube was designed in our previous work. In this paper, we generalize its application to the case where obstacles exist inside the trapezoid virtual tube and the UAVs are subject to strict speed constraints. First, a distributed vector field controller is proposed for a trapezoid virtual tube with no obstacle inside, and the relationship between the trapezoid virtual tube and the speed constraints is presented. Then, a switching logic for obstacle avoidance is put forward; the key idea is to divide the trapezoid virtual tube containing obstacles into several sub trapezoid virtual tubes with no obstacle inside. Formal analyses and proofs show that all UAVs are able to pass through the trapezoid virtual tube safely. Besides, the effectiveness of the proposed method is validated by numerical simulations and real experiments.
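A minimal sketch of a saturated vector-field command of the kind described above, for one UAV in a straight tube segment: a forward term along the tube axis is blended with a term that steers the UAV back toward the centerline, and the result is clipped to respect a speed constraint. All names and gains (tube_axis, half_width, v_max) are illustrative assumptions, not the paper's controller.

```python
# Illustrative sketch only: a saturated vector-field velocity command for one
# UAV inside a straight virtual-tube segment (not the paper's controller).
import numpy as np

def tube_velocity_command(pos, tube_axis, centerline_pt, half_width, v_max):
    """Blend a forward term along the tube with a keep-inside term."""
    axis = tube_axis / np.linalg.norm(tube_axis)       # unit vector along the tube
    offset = pos - centerline_pt
    lateral = offset - np.dot(offset, axis) * axis     # component across the tube
    d = np.linalg.norm(lateral)
    # Weight of the keep-inside term grows as the UAV approaches the boundary.
    k_keep = min(1.0, d / max(half_width, 1e-6))
    v = (1.0 - k_keep) * axis - k_keep * (lateral / (d + 1e-6))
    # Enforce the speed constraint by saturation.
    speed = np.linalg.norm(v)
    return v_max * v / speed if speed > 1e-6 else np.zeros_like(v)
```

For example, `tube_velocity_command(np.array([1.0, 0.5]), np.array([1.0, 0.0]), np.zeros(2), half_width=2.0, v_max=1.5)` returns a command whose norm never exceeds 1.5.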
Scene text spotting is of great importance to the computer vision community due to its wide variety of applications. Recent methods attempt to introduce linguistic knowledge for challenging recognition rather than pure visual classification. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting. Firstly, the autonomous principle suggests enforcing explicit language modeling by decoupling the recognizer into a vision model and a language model and blocking gradient flow between the two. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model, which effectively alleviates the impact of noisy input. Finally, to polish ABINet++ in long text recognition, we propose to aggregate horizontal features by embedding Transformer units inside a U-Net, and design a position and content attention module which integrates character order and content to attend to character features precisely. ABINet++ achieves state-of-the-art performance on both scene text recognition and scene text spotting benchmarks, which consistently demonstrates the superiority of our method in various environments, especially on low-quality images. Besides, extensive experiments on both English and Chinese also prove that a text spotter incorporating our language modeling method can significantly improve its performance in both accuracy and speed compared with commonly used attention-based recognizers.
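A minimal sketch of the iterative-correction execution manner described above: the prediction is repeatedly passed through the language model with its gradient detached (so language modeling stays explicit and autonomous) and fused back with the visual prediction. `vision_model`, `language_model` and `fuse` are hypothetical callables standing in for the actual ABINet++ modules.

```python
# Illustrative sketch only: iterative correction with an autonomous language
# model; the module internals are placeholders, not the ABINet++ code.
import torch

def iterative_recognize(image_feat, vision_model, language_model, fuse, num_iters=3):
    vis_logits = vision_model(image_feat)            # (T, num_classes) visual prediction
    logits = vis_logits
    for _ in range(num_iters):
        # Block gradient flow into the language model (autonomous principle).
        probs = torch.softmax(logits, dim=-1).detach()
        lang_logits = language_model(probs)          # spelling-style correction of noisy input
        logits = fuse(vis_logits, lang_logits)       # fuse the vision and language views
    return logits
```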
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform. Existing methods usually focus on optimizing the distance between one embedding and others in the feature space, while neglecting the redundancy of the embedding itself. In this paper, we argue that low redundancy is also important, since it motivates the model to mine more diverse patterns. To verify this point, we introduce a simple yet effective regularization, i.e., Dynamic Weighted Decorrelation Regularization (DWDR), to explicitly encourage networks to learn independent embedding channels. As the name implies, DWDR regresses the embedding correlation coefficient matrix to a sparse matrix, i.e., the identity matrix, with dynamic weights. The dynamic weights are applied to focus on still-correlated channels during training. Besides, we propose a cross-view symmetric sampling strategy, which keeps the example balance between different platforms. Albeit simple, the proposed method achieves competitive results on three large-scale benchmarks, i.e., University-1652, CVUSA and CVACT. Moreover, under harsh conditions, e.g., with an extremely short 64-dimensional feature, the proposed method surpasses the baseline model by a clear margin.
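A minimal sketch of a decorrelation penalty in the spirit of DWDR: the channel correlation matrix of a batch of embeddings is regressed toward the identity, with per-entry weights that concentrate on channels that are still correlated. The weighting scheme and constants here are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: push the channel correlation matrix toward the
# identity, re-weighting entries that remain strongly correlated.
import torch

def decorrelation_loss(emb):
    """emb: (batch, dim) embedding matrix."""
    emb = emb - emb.mean(dim=0, keepdim=True)
    emb = emb / (emb.std(dim=0, keepdim=True) + 1e-6)
    corr = (emb.t() @ emb) / emb.shape[0]                    # channel correlation matrix
    target = torch.eye(emb.shape[1], device=emb.device)      # sparse target: the identity
    err = (corr - target) ** 2
    weights = err.detach() / (err.detach().sum() + 1e-12)    # focus on still-correlated channels
    return (weights * err).sum()
```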
Existing gait recognition studies are dominated by in-the-lab scenarios. Since people live in the real world, gait recognition in the wild is a more practical problem and has recently attracted attention from the multimedia and computer vision communities. Current methods that achieve state-of-the-art performance on existing benchmarks perform much worse on the recently proposed in-the-wild datasets, because they can hardly model the varied temporal dynamics of gait sequences in unconstrained scenes. Therefore, this paper proposes a novel multi-hop temporal switch method for effective temporal modeling of gait patterns in real-world scenes. Specifically, we design a novel gait recognition network, called the Multi-hop Temporal Switch Network (MTSGait), to learn spatial features and multi-scale temporal features simultaneously. Different from existing methods that use 3D convolutions for temporal modeling, our MTSGait models the temporal dynamics of gait sequences with 2D convolutions. In this way, it achieves high efficiency with fewer model parameters and reduces the difficulty of optimization compared with 3D convolution-based models. Based on the specific design of the 2D convolution kernels, our method can eliminate the misalignment of features between adjacent frames. In addition, a new sampling strategy, i.e., non-cyclic continuous sampling, is proposed to make the model learn more robust temporal features. Finally, the proposed method achieves superior performance on two public in-the-wild gait datasets, i.e., GREW and Gait3D, compared with state-of-the-art methods.
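A minimal sketch of multi-hop temporal switching over per-frame 2D-CNN feature maps: groups of channels are exchanged with frames one and two hops away, so temporal context is mixed without any 3D convolution. The channel split and hop sizes are illustrative assumptions, not the actual MTSGait design.

```python
# Illustrative sketch only: exchange channel groups across frames at several
# temporal hops, keeping all spatial operations 2D.
import torch

def multi_hop_switch(x, hops=(1, 2)):
    """x: (batch, frames, channels, h, w) per-frame feature maps."""
    chunks = torch.chunk(x, len(hops) + 1, dim=2)        # split channels into groups
    switched = [chunks[0]]                               # first group stays in place
    for hop, c in zip(hops, chunks[1:]):
        switched.append(torch.roll(c, shifts=hop, dims=1))  # exchange along the time axis
    return torch.cat(switched, dim=2)
```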
Binary code similarity detection (BCSD) methods measure the similarity between two binary executables. Recently, learning-based BCSD methods have achieved great success, outperforming traditional BCSD in both detection accuracy and efficiency. However, existing research on the adversarial vulnerability of learning-based BCSD methods is rather sparse, which puts security-related applications at risk. To evaluate adversarial robustness, this paper designs an efficient and black-box adversarial code generation algorithm, named FuncFooler. FuncFooler constrains the adversarial code to 1) keep the program's control flow graph (CFG) unchanged and 2) preserve the same semantic meaning. Specifically, FuncFooler iteratively 1) determines vulnerable candidates in the malicious code, 2) selects and inserts adversarial instructions drawn from benign code, and 3) corrects the semantic side effects of the adversarial code to satisfy the constraints. Empirically, FuncFooler can successfully attack three learning-based BCSD models, including SAFE, Asm2Vec and jTrans, which calls into question whether learning-based BCSD is desirable.
To guide a multi-agent system in a cluttered environment, a connected quadrilateral virtual tube is designed, inside which all agents keep moving; its basic component is called the single trapezoid virtual tube. There is no obstacle inside the tube, namely the area inside the tube can be regarded as a safety zone. Then, a distributed swarm controller is proposed for the single trapezoid virtual tube passing problem. The problem is solved by a gradient vector field method with no local minima. Formal analyses and proofs are made to show that all agents are able to pass through the single trapezoid virtual tube. Moreover, for convenience of practical use, a modified controller is put forward. For the connected quadrilateral virtual tube, a modified switching logic is proposed to avoid deadlock and prevent agents from moving outside the virtual tube. Finally, the effectiveness of the proposed method is validated by numerical simulations and real experiments.
Robotic swarm systems are becoming increasingly attractive for many challenging applications. The main task of any robot is to reach its destination while keeping a safe separation from other robots and obstacles. In many scenarios, robots need to move within narrow corridors, or through windows or door frames. To guide all robots to move in such cluttered environments, a curve virtual tube with no obstacle inside is carefully designed in this paper. There is no obstacle inside the tube, namely the area inside the tube can be regarded as a safety zone. Then, a distributed swarm controller with three elaborate control terms is proposed: a line approaching term, a robot avoidance term, and a tube keeping term. Formal analyses and proofs show that the curve virtual tube passing problem can be solved within a finite time. For convenience, a modified controller with approximately equivalent control performance is put forward. Finally, the effectiveness of the proposed method is validated by numerical simulations and real experiments. To show the advantages of the proposed method, a comparison between our method and the control barrier function method is also presented in terms of computation speed.
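A minimal sketch of a three-term swarm command of the kind described above: a line approaching term, a robot avoidance term, and a tube keeping term are summed and saturated. Gains, thresholds, and the tube geometry here are illustrative assumptions, not the paper's controller.

```python
# Illustrative sketch only: sum a line-approaching, robot-avoidance, and
# tube-keeping term for one robot, then saturate the command.
import numpy as np

def swarm_command(pos, goal_on_line, neighbors, tube_center, tube_radius,
                  v_max=1.0, safe_dist=0.5):
    approach = goal_on_line - pos                              # line approaching term
    avoid = np.zeros(2)
    for q in neighbors:                                        # robot avoidance term
        diff = pos - q
        d = np.linalg.norm(diff)
        if 1e-6 < d < safe_dist:
            avoid += (safe_dist - d) * diff / d
    to_center = tube_center - pos                              # tube keeping term
    d_c = np.linalg.norm(to_center)
    keep = max(0.0, d_c - tube_radius) * to_center / (d_c + 1e-6)
    v = approach + 2.0 * avoid + 2.0 * keep
    s = np.linalg.norm(v)
    return v_max * v / s if s > v_max else v                   # saturate the speed
```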
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
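A minimal sketch of the token-level fusion idea described above: image tokens and point-cloud tokens receive position encodings derived from 3D reference points and are then processed jointly by a standard Transformer encoder. The dimensions and the encoding MLP are illustrative assumptions, not the actual CMT architecture.

```python
# Illustrative sketch only: implicit spatial alignment by adding 3D-point
# position encodings to both modalities before joint attention.
import torch
import torch.nn as nn

class TokenFusion(nn.Module):
    def __init__(self, dim=256, heads=8, layers=2):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, img_tokens, img_points, pc_tokens, pc_points):
        # Encode each token's 3D reference point, then fuse both modalities.
        img_tokens = img_tokens + self.pos_mlp(img_points)
        pc_tokens = pc_tokens + self.pos_mlp(pc_points)
        return self.encoder(torch.cat([img_tokens, pc_tokens], dim=1))
```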
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest raw data collection in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
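A minimal sketch of the feature-selection step mentioned above, keeping the 20 user property features with the greatest information gain (estimated here with mutual information). The data arrays are random placeholders, not MGTAB itself.

```python
# Illustrative sketch only: select the top-20 user property features by an
# information-gain-style criterion on placeholder data.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X = np.random.rand(1000, 40)             # placeholder user property features
y = np.random.randint(0, 2, size=1000)   # placeholder account labels (human / bot)

selector = SelectKBest(score_func=mutual_info_classif, k=20)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                   # (1000, 20) features kept for the user representation
```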
Learning feature interactions is key to the success of large-scale CTR prediction and recommendation. In practice, handcrafted feature engineering usually requires exhaustive searching. In order to reduce the high cost of human effort in feature engineering, researchers have proposed several deep neural network (DNN)-based approaches to learn feature interactions in an end-to-end fashion. However, existing methods either do not learn both vector-wise interactions and bit-wise interactions simultaneously, or fail to combine them in a controllable manner. In this paper, we propose a new model, xDeepInt, based on a novel network architecture called the polynomial interaction network (PIN), which learns higher-order vector-wise interactions recursively. By integrating a subspace-crossing mechanism, we enable xDeepInt to balance the mixture of vector-wise and bit-wise feature interactions at a bounded order. Based on the network architecture, we customize a combined optimization strategy to conduct feature selection and interaction selection. We implement the proposed model and evaluate its performance on three real-world datasets. Our experimental results demonstrate the efficacy and effectiveness of xDeepInt over state-of-the-art models. We open-source the TensorFlow implementation of xDeepInt: https://github.com/yanyachen/xDeepInt.
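A minimal sketch of a recursive vector-wise interaction layer in the spirit of the polynomial interaction network (PIN): each layer multiplies the current representation element-wise with a linear map of the raw field embeddings, raising the interaction order by one while a residual keeps the lower orders. The exact update rule and the subspace-crossing mechanism are simplified assumptions, not the xDeepInt implementation.

```python
# Illustrative sketch only: recursive higher-order vector-wise interactions
# via Hadamard products with the raw embeddings.
import torch
import torch.nn as nn

class PolyInteraction(nn.Module):
    def __init__(self, dim, num_layers=3):
        super().__init__()
        self.maps = nn.ModuleList(nn.Linear(dim, dim, bias=False) for _ in range(num_layers))

    def forward(self, x0):
        """x0: (batch, dim) concatenated field embeddings."""
        x = x0
        for lin in self.maps:
            # Hadamard product raises the interaction order; the residual keeps lower orders.
            x = x * lin(x0) + x
        return x
```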